feat(performance): query improvements for trends (load less people) #23135

aspicer · 2024-06-20T22:50:47Z

Problem

MX trends actor queries are failing https://posthog.slack.com/archives/C0368RPHLQH/p1718782039244149

They are OOMing because of the sheer number of persons and PDIs (person distinct IDs). MX has 50,000,000 PDI entries and 200,000,000 persons, and joining events against them takes more than 36 GB of memory in ClickHouse.

Changes

This adds a new alias for the persons table, called "filterable_persons". This table inspects the join's ON expression for an `id IN (...)` clause. If it finds one, it pushes that clause down into the persons subselect, so we no longer have to load every person.
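The pushdown described above can be sketched on a toy expression tree. This is a minimal illustration, not the HogQL implementation: the `Field`, `Call`, and `PersonsSubselect` types and the `find_id_in` / `push_down_id_filter` helpers are hypothetical, and the sketch copies the clause into the subselect rather than asserting whether the real code removes it from the ON expression.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Field:
    """A column reference, e.g. Field("id") (hypothetical toy AST node)."""
    name: str


@dataclass
class Call:
    """A function call node such as and(...), equals(...) or in(...)."""
    name: str
    args: list


# A plain string stands in for an opaque subquery in this toy AST.
Expr = Union[Field, Call, str]


@dataclass
class PersonsSubselect:
    """The WHERE conditions of the persons subselect (hypothetical model)."""
    where: list = field(default_factory=list)


def find_id_in(expr: Expr) -> Optional[Call]:
    """Return the first in(id, ...) call reachable through and() nodes only.

    Descending into or() would be unsound: a person row that fails the IN
    filter in one OR branch could still match the join through another branch.
    """
    if not isinstance(expr, Call):
        return None
    if expr.name == "in" and isinstance(expr.args[0], Field) and expr.args[0].name == "id":
        return expr
    if expr.name == "and":
        for arg in expr.args:
            found = find_id_in(arg)
            if found is not None:
                return found
    return None


def push_down_id_filter(join_on: Expr, persons: PersonsSubselect) -> bool:
    """Copy an id IN filter from the join's ON expression into the persons subselect."""
    clause = find_id_in(join_on)
    if clause is None:
        return False  # nothing to push down: persons is still loaded in full
    persons.where.append(clause)
    return True
```

Restricting the search to `and()` branches is what keeps the rewrite safe: a predicate that every matching row must satisfy can be applied before the join without changing the result.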

This behavior is similar to the WhereClauseExtractor, but the WhereClauseExtractor is general and conservative. filterable_persons assumes you know what you're doing in the query and skips those checks. It would be much too hard to extend the WhereClauseExtractor to recognize that it can move the entire subselect into the WHERE clause.

It achieves reuse of the source query's results by adding the use_query_cache setting with a query_cache_ttl of 600 seconds (HOGQL_INCREASED_MAX_EXECUTION_TIME). This means that if the exact same source query is rerun within the next 10 minutes, it returns the cached results. We shouldn't be allowing people to run queries that frequently, so this shouldn't be an issue. If it becomes one, we can add some sort of nonce to the CTE so that the cache is only reused within this query.
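A minimal sketch of how the cache settings end up on the source query, matching the `SETTINGS use_query_cache=1, query_cache_ttl=600` suffix visible in the query snapshot. The helper names `cache_settings` and `wrap_source_query` are hypothetical, and the `reused` flag is an assumption loosely modeled on the "don't cache if not using cte" commit. ClickHouse's query cache returns stored results for an identical query until `query_cache_ttl` expires.

```python
HOGQL_INCREASED_MAX_EXECUTION_TIME = 600  # seconds; value stated in the PR description


def cache_settings(ttl_seconds: int = HOGQL_INCREASED_MAX_EXECUTION_TIME) -> str:
    """Render ClickHouse query-cache settings as a SETTINGS clause (hypothetical helper)."""
    settings = {"use_query_cache": 1, "query_cache_ttl": ttl_seconds}
    return "SETTINGS " + ", ".join(f"{name}={value}" for name, value in settings.items())


def wrap_source_query(source_sql: str, reused: bool) -> str:
    """Append cache settings only when the source query's results are reused.

    Assumption: caching a query that runs only once just adds cache churn.
    """
    return f"{source_sql} {cache_settings()}" if reused else source_sql
```

For example, `wrap_source_query("SELECT ... GROUP BY actor_id", reused=True)` appends the same settings clause that appears at the end of the inner source query in the snapshot.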

We also set optimize_aggregation_in_order on the grouping subquery in person_distinct queries. In isolation this helps a lot with memory usage (from 16 GB down to 1 GB), but it matters much less once that subquery is joined against.
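To show where that setting sits, here is an illustrative PDI grouping subquery as a Python string builder. The column names (`person_id`, `distinct_id`, `version`, `is_deleted`) are assumptions modeled on PostHog's person_distinct_id2 table, and `pdi_subquery` is a hypothetical helper, not the generated HogQL. ClickHouse can only aggregate in order when the GROUP BY key lines up with the table's sorting key; whether that holds here depends on the table definition.

```python
def pdi_subquery(team_id: int) -> str:
    """Illustrative person_distinct_id2 grouping subquery (hypothetical helper)."""
    team_id = int(team_id)  # keep the interpolated value strictly numeric
    return (
        "SELECT argMax(person_id, version) AS person_id, distinct_id "
        "FROM person_distinct_id2 "
        f"WHERE team_id = {team_id} "
        "GROUP BY distinct_id "
        "HAVING argMax(is_deleted, version) = 0 "
        # Lets ClickHouse stream groups in sorting-key order instead of holding
        # every distinct_id in one hash table, when the key order allows it.
        "SETTINGS optimize_aggregation_in_order = 1"
    )
```

Putting the setting on the subquery rather than the outer query keeps its effect scoped to the PDI aggregation, which matches the PR's observation that it helps in isolation but not much once joined.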

Unrelated but included:

Makes keyboard shortcuts work when holding Shift causes 'k' to be reported as a capital 'K'

Followups

More generally, we should expand the WhereClauseExtractor and make sure we're actually using it. This will help in event queries that join against persons and then filter on person properties.

How did you test this code?

Wrote tests. Looked at the SQL. Ran a bunch of actors queries with breakdowns and filters locally.

posthog-bot · 2024-06-20T23:07:27Z

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

github-actions · 2024-06-20T23:22:03Z

Size Change: 0 B

Total Size: 1.06 MB


Filename	Size
`frontend/dist/toolbar.js`	1.06 MB

compressed-size-action

posthog-bot · 2024-06-20T23:23:09Z

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

…spicer/group_pdi

posthog/hogql/constants.py

posthog/hogql_queries/actors_query_runner.py

posthog/hogql/database/schema/persons.py

Twixes · 2024-06-25T18:14:19Z

posthog/hogql_queries/insights/trends/test/__snapshots__/test_trends.ambr

+     WHERE and(equals(person.team_id, 2), in(id,
+                                               (SELECT source.actor_id AS actor_id
+                                                FROM
+                                                  (SELECT actor_id AS actor_id, count() AS event_count, groupUniqArray(100)(tuple(timestamp, uuid, `$session_id`, `$window_id`)) AS matching_events
+                                                   FROM
+                                                     (SELECT e.person_id AS actor_id, toTimeZone(e.timestamp, 'UTC') AS timestamp, e.uuid AS uuid, e.`$session_id` AS `$session_id`, e.`$window_id` AS `$window_id`
+                                                      FROM events AS e
+                                                      LEFT JOIN
+                                                        (SELECT argMax(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(groups.group_properties, 'industry'), ''), 'null'), '^"|"$', ''), toTimeZone(groups._timestamp, 'UTC')) AS properties___industry, groups.group_type_index AS index, groups.group_key AS key
+                                                         FROM groups
+                                                         WHERE and(equals(groups.team_id, 2), ifNull(equals(index, 0), 0))
+                                                         GROUP BY groups.group_type_index, groups.group_key) AS e__group_0 ON equals(e.`$group_0`, e__group_0.key)
+                                                      WHERE and(equals(e.team_id, 2), equals(e.event, 'sign up'), greaterOrEquals(toTimeZone(e.timestamp, 'UTC'), toDateTime64('2020-01-02 00:00:00.000000', 6, 'UTC')), less(toTimeZone(e.timestamp, 'UTC'), toDateTime64('2020-01-03 00:00:00.000000', 6, 'UTC')), ifNull(equals(e__group_0.properties___industry, 'technology'), 0)))
+                                                   GROUP BY actor_id SETTINGS use_query_cache=1, query_cache_ttl=600) AS source)))


This case of the optimization kicking in looks massive, but it seems legit at a glance. I do wonder how fast it is, given that the SELECT inside in() is triple-nested. 😅 It probably depends on whether the events time range is large (it can be months for an aggregate query) or tiny (a single day). (This is not actionable, I'm just wondering.)

#23135 (comment)

This sort of got buried, but I discussed the performance difference with Marius above. I think this will (almost) always make things faster: if you have enough events for the inner queries to run slowly, you will also have enough people to make the unfiltered join inefficient.

If this ends up slowing down small users, we can figure out a way to create a set of "large user optimizations" that maybe only get turned on once people hit some event limit.

This will be much faster once materialized CTEs land in ClickHouse.

Twixes · 2024-06-25T18:16:12Z

I like this a lot! Especially without filterable_persons being separate. It does seem like some tests might need updating, though (they won't run as long as there's a merge conflict).

…spicer/group_pdi

aspicer · 2024-06-25T23:32:31Z

This change dropped memory usage on actors queries from 34 GB to 1 GB for MX. 💪

sentry-io · 2024-06-26T06:37:34Z

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

‼️ SyntaxError: mismatched input '_matching_events' expecting posthog.tasks.tasks.process_query_task
‼️ ValueError: Missing both funnelStep and funnelCustomSteps posthog.tasks.tasks.process_query_task
‼️ SyntaxError: mismatched input '_matching_events' expecting posthog.tasks.tasks.process_query_task
‼️ TypeError: dateutil.relativedelta.relativedelta() argument after ** must be a mapping, not NoneType posthog.tasks.tasks.process_query_task
‼️ ValueError: Missing both funnelStep and funnelCustomSteps posthog.tasks.tasks.process_query_task


…23135)

aspicer added 2 commits June 20, 2024 14:38

group pdi

22c538d

fixes

07ac16a

aspicer changed the title ~~feat(performance): query improvements for trends~~ feat(performance): query improvements for trends (load less people) Jun 20, 2024

github-actions bot added 2 commits June 20, 2024 23:00

Update query snapshots

bf33493

Update UI snapshots for chromium (2)

160b1c5

Update query snapshots

ee3b5dc

Update UI snapshots for chromium (2)

1878fbc

aspicer and others added 9 commits June 20, 2024 16:33

person ids

50964dc

fix

d377853

Update UI snapshots for chromium (2)

5da1b47

fix mypy

14c0caf

fix bug

35e2cb2

Update UI snapshots for chromium (2)

f2b3926

Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…

6486f1e

…spicer/group_pdi

Update UI snapshots for chromium (2)

4ca7bbd

Update query snapshots

ae94436

aspicer marked this pull request as ready for review June 21, 2024 00:15

github-actions bot and others added 5 commits June 21, 2024 00:15

Update query snapshots

9149173

Update query snapshots

c76a96a

don't cache if not using cte

cb5af95

Update query snapshots

64f160f

Merge branch 'aspicer/group_pdi' of github.com:PostHog/posthog into a…

814766d

…spicer/group_pdi

aspicer requested review from a team and mariusandra June 21, 2024 00:19

aspicer and others added 3 commits June 20, 2024 17:21

Merge branch 'master' into aspicer/group_pdi

a1266e8

Update UI snapshots for chromium (2)

e61b9fc

Update query snapshots

a58194c

aspicer added 3 commits June 25, 2024 10:31

tests and cleanup

f9eb633

don't allow no table name

38116a4

don't make a new table if no promotions

e3e7f27